Color picker 1 - Use this one for gradient scales or brewer scales (this is explained later in the notebook).
Color picker 2 - Use this to pick the hex code of a specific color.
Color picker 3 - Use this to pick the hex code of a specific color.
We are going to specify colors in two ways:
Fixed name: e.g. "red", "green", "blue". Limited to a predefined set of names (run colors() in R to list them).
Hexadecimal codes: e.g. "#fff5eb", "#7f2704". Each color has a unique hexadecimal identifier. Use the links above to look up a specific color.
In this course I will mostly use the hexadecimal notation for colors.
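Both notations describe the same thing. The short sketch below uses base R's col2rgb() and rgb() (from the grDevices package, which R loads by default) to convert between a color name and its hex code. It is only an illustration; ggplot2 accepts either form directly.

```r
# Both notations resolve to the same red/green/blue components
named_red <- col2rgb("red")      # named color
hex_red   <- col2rgb("#FF0000")  # the same color as a hex code
identical(named_red, hex_red)

# Convert RGB components (0-255) back into a hex code
red_hex <- rgb(255, 0, 0, maxColorValue = 255)
red_hex

# colors() lists every color name R recognizes
head(colors())
"red" %in% colors()
```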
Discrete variables - scale_color_brewer() and scale_fill_brewer()
These commands are meant for color scales that follow discrete variables.
For figures in which we define the fill aesthetic (e.g. histograms, barplots, heatmaps, box-plots…), we need to use scale_fill_brewer().
For figures in which we define the color aesthetic (e.g. scatterplots, lineplots, density graphs…) we need to use scale_color_brewer().
Let us create a couple of graphs colored according to a discrete variable (clarity); we will adjust their color scales throughout this section:
# Color following clarity, discrete variable
p1 <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point()
p1

p2 <- ggplot(diamonds, aes(x = color, fill = clarity)) +
  geom_bar()
p2
We essentially have three types of palettes for the color scales of these discrete variables:
Sequential (18 palettes)
Qualitative (8 palettes)
Divergent (9 palettes)
Let us explain the logic behind them.
Palettes used by ggplot2 - RColorBrewer
The palettes used by scale_color_brewer() and scale_fill_brewer() come from the package RColorBrewer. There is no need for us to load this package, but it is useful to know this.
I am going to load the package now just to show the palettes it contains. These display commands can be useful if you want to find specific color combinations.
library(RColorBrewer) # If you have not installed RColorBrewer, install it prior
                      # to running this command.
# Display the sequential palettes
display.brewer.all(type = "seq")
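Besides the visual display, RColorBrewer ships a table describing every palette, which is a quick way to confirm the palette counts listed above and to get the hex codes of a palette. A minimal sketch; "Blues" with 9 colors is just an example choice:

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette:
# maxcolors, category ("div", "qual" or "seq") and colorblind
info <- brewer.pal.info
table(info$category)

# Names of the sequential palettes
seq_names <- rownames(info)[info$category == "seq"]
seq_names

# Hex codes of a specific palette: 9 colors of "Blues"
blues <- brewer.pal(9, "Blues")
blues
```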
Sequential palettes are suited for ordered data that progresses from low to high. Color-wise, they progress from light to dark:
Light colors: low values
Dark colors: high values
The annotated image below summarizes this and provides the standard sequential palettes available in ggplot:
Sequential color brewer palettes
The colors in our graph use the variable clarity, which is a discrete measure of how clear the diamond is. Its values follow a quality hierarchy, from I1 (worst) to IF (best); run ?diamonds and read about the variable.
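Because clarity is ordered, a sequential palette is a natural fit. A minimal sketch, using the "Blues" palette as an arbitrary choice among the sequential ones and rebuilding the bar plot so the block runs on its own:

```r
library(ggplot2)

# clarity is an ordered factor, so a sequential palette fits:
# light colors for the lowest clarity grades, dark for the highest
p_seq <- ggplot(diamonds, aes(x = color, fill = clarity)) +
  geom_bar() +
  scale_fill_brewer(type = "seq", palette = "Blues")
p_seq
```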
Qualitative palettes do not imply magnitude differences between legend classes and are used to create visual differences between the classes.
Use them when you want to assign a distinct color to each value of the categorical variable, without any particular ordering or hierarchy.
The annotated image below summarizes this and provides the standard qualitative palettes available in ggplot. It is followed by a ggplot example:
Qualitative palettes in ggplot
Let us, for example, apply the 6th qualitative palette to our figures:
p1 + scale_color_brewer(type = "qual", palette = 6)
p2 + scale_fill_brewer(type = "qual", palette = 6)
How to choose a built-in palette in R
Option 1: specify palette type and palette number
p1 + scale_color_brewer(type = "seq", palette = 12)
Be careful: if you specify an index beyond the number of palettes of that type, you will get an error.
# If you run this, you will get an error (there are only 18 sequential palettes)
p1 + scale_color_brewer(type = "seq", palette = 19)
Option 2: specify directly the palette name
In this case, also specifying the palette type has no effect: the palette name overrides the palette type.
p1 + scale_color_brewer(palette = "PuBuGn")
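scale_color_brewer() and scale_fill_brewer() also accept a direction argument; direction = -1 reverses the order in which the palette colors are assigned to the levels. A minimal sketch, rebuilding the scatterplot so the block runs on its own:

```r
library(ggplot2)

p1 <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point()

# Same palette as above, but with the color order reversed:
# the darkest color now goes to the first level of clarity
p1_rev <- p1 + scale_color_brewer(palette = "PuBuGn", direction = -1)
p1_rev
```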
Manually define your own palettes.
For figures in which we define the fill aesthetic (e.g. histograms, barplots, heatmaps, box-plots…), we may use scale_fill_manual() to define our own color palettes specifying all the colors.
For figures in which we define the color aesthetic (e.g. scatterplots, lineplots, density graphs…) we may use scale_color_manual() to define our own color palettes specifying all the colors.
You can use resources such as this link to select palettes and then specify them color by color.
Or you may directly use the colors of your choice.
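A minimal sketch of the idea with scale_fill_manual(): passing a named vector ties each color to a specific level, so the mapping does not depend on the order of the levels. The toy data frame and its three colors are hypothetical, made up for illustration:

```r
library(ggplot2)

# Hypothetical toy data: three categories with a count each
toy <- data.frame(
  fruit = c("apple", "banana", "grape"),
  n     = c(10, 7, 4)
)

# Named vector: names are the levels, values are the colors
fruit_colors <- c(apple = "#E41A1C", banana = "#FFD92F", grape = "#984EA3")

p_manual <- ggplot(toy, aes(x = fruit, y = n, fill = fruit)) +
  geom_col() +
  scale_fill_manual(values = fruit_colors)
p_manual
```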
As an exercise on how to do this, let us use a bar plot to draw the rainbow flag. I looked up the hexadecimal codes of the rainbow flag colors here.
Using these color codes (which define a palette of 6 colors), a bar plot and the function scale_fill_manual(), we may generate a rainbow flag such as the one below.
NOTE: this example is included just as an exercise/challenge on how to manually specify color variables and tweak graphs in ggplot2. You will never use ggplot2 for this in real life.
The code to generate the flag above is the following (please read carefully and understand the comments and the effect of every line):
# Named vector with the colors in hex notation
rainbow_flag_c <- c("Life" = "#E40303", "Healing" = "#FF8C00", "Sunlight" = "#FFED00",
                    "Nature" = "#008026", "Serenity" = "#004dff", "Spirit" = "#750787")

# Manually created dataframe
df <- tibble(
  # Defined as a factor variable to ensure proper order of the colors.
  # The argument levels must contain the colors in the appropriate order.
  x = factor(names(rainbow_flag_c), levels = names(rainbow_flag_c)),
  # Constant value to print out a flag
  y = c(10, 10, 10, 10, 10, 10)
)

bars_flag <-
  # Define the canvas for the bar plot
  ggplot(df, aes(x, y, fill = x)) +
  # stat = "identity" used because we specified y in aes()
  # width = 1 removes the spacing between the bars
  geom_bar(stat = "identity", width = 1) +
  # Manual color scale for our levels defined with a named vector
  scale_fill_manual(values = rainbow_flag_c) +
  # Reverse the x-axis so that colors are printed in the proper order
  # after flipping the axes with coord_flip()
  # I included this command `a posteriori`, when I realized
  # it was necessary after checking the result of coord_flip()
  scale_x_discrete(limits = rev) +
  # Rotate coordinates to achieve flag effect
  coord_flip() +
  # Remove both x and y labels
  xlab("") +
  ylab("") +
  # Introduce theme modifications to set white background,
  # remove axes and remove the legend
  theme(
    panel.background = element_rect(fill = 'white'), # Use white background
    axis.text.x = element_blank(),   # remove text from x axis
    axis.ticks.x = element_blank(),  # remove ticks from x axis
    axis.text.y = element_blank(),   # remove text from y axis
    axis.ticks.y = element_blank(),  # remove ticks from y axis
    legend.position = "none"         # remove legend
  ) +
  # Print the meaning of each of the colors in each of the bars
  geom_text(
    label = names(rainbow_flag_c),           # Sets label within each bar
    color = "white",                         # Sets white color for the letters
    position = position_stack(vjust = 0.5),  # Center position
    size = 10                                # Adjust size
  )

bars_flag
For figures in which we define the fill aesthetic (e.g. histograms, barplots, heatmaps, box-plots…), we need to use either scale_fill_gradient() or scale_fill_gradient2().
For figures in which we define the color aesthetic (e.g. scatterplots, lineplots, density graphs…) we need to use either scale_color_gradient() or scale_color_gradient2().
Variables from low to high - scale_color_gradient() and scale_fill_gradient()
These commands are used for continuous variables that range from a low value to a high value. They are the continuous counterpart of the sequential discrete palettes you may use in scale_color_brewer() (see the corresponding section earlier in this notebook).
Let us first create a base graph which is colored following a continuous variable from low to high, in this case the variable price of the diamonds dataset. ggplot automatically uses a gradient scale for the variable.
# Color following price, continuous variable
p3 <- ggplot(diamonds, aes(x = carat, y = price, color = price)) +
  geom_point()
p3
We can change the color gradient used by ggplot by specifying the low and high end of the gradient. For example, if we use the colors with the hexadecimal codes "#E1FA72" and "#F46FEE" (we will see further down how to pick colors) we get:
p3 + scale_color_gradient(low = "#E1FA72", high = "#F46FEE")
Or for example:
p3 + scale_color_gradient(low = "red", high = "green")
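The fill counterpart works the same way. A minimal sketch with scale_fill_gradient(), using ggplot2's built-in faithfuld dataset (a 2D density estimate of the Old Faithful geyser data), chosen here only because it has a continuous variable to map to fill:

```r
library(ggplot2)

# geom_raster() draws one tile per grid cell; fill follows the
# continuous variable density, so a gradient fill scale applies
p_fill <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster() +
  scale_fill_gradient(low = "#E1FA72", high = "#F46FEE")
p_fill
```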
How to pick sensible colors for scale_color_gradient() or scale_fill_gradient()
You can use this website to select scales that have been found to work well.
Open the website.
Select the maximum possible number of data classes.
Select sequential under nature of your data.
Pick a color scheme you like.
Specify the first color of that scheme as the low end of your gradient and the last color as the high end of your color gradient.
NOTE: if the first color makes the low end of your gradient too light, pick the second or third color. The same applies if the last color is too dark.
The screenshot below highlights the important points to consider. An example with ggplot follows.
Use colorbrewer2 to define a sensible color gradient
Example:
p3 + scale_color_gradient(low = "#f7fcfd", high = "#00441b")
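If you prefer to stay inside R, RColorBrewer's brewer.pal() returns the ColorBrewer schemes as hex codes, so you can take the first and last colors from it instead of copying them from the website. A minimal sketch, assuming the 9-class BuGn scheme, which should give the same two codes as the example above:

```r
library(ggplot2)
library(RColorBrewer)

# 9-class sequential BuGn scheme as hex codes
bugn <- brewer.pal(9, "BuGn")

low_col  <- bugn[1]             # lightest color
high_col <- bugn[length(bugn)]  # darkest color

p3 <- ggplot(diamonds, aes(x = carat, y = price, color = price)) +
  geom_point() +
  scale_color_gradient(low = low_col, high = high_col)
p3
```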
Variables with a midpoint and two extremes (low-high): scale_color_gradient2() - scale_fill_gradient2()
These commands apply to continuous variables with a midpoint and two extremes. They are the continuous counterpart of the divergent discrete color palettes you may use in scale_color_brewer() (see the corresponding section earlier in this notebook).
How to pick sensible colors for a diverging two color gradient scale
You can use this website to select scales that have been found to work well.
Open the website.
Select diverging under nature of your data.
Select the maximum possible number of data classes.
Pick a color scheme you like.
Specify the first color of that scheme as the low end of your gradient, the mid color as the midpoint of your gradient and the last color as the high end of your color gradient.
The screenshot below highlights the important points to consider. An example with ggplot follows.
Select colors for diverging color gradients using colorbrewer2
NOTE: if either the low or high end of your color gradient seems too intense, consider moving one step inwards on both ends (change both the low and the high end so that the scale remains symmetric). For example:
Select colors for diverging color gradients using colorbrewer2
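The symmetric choice of endpoints can also be made in R. A minimal sketch, assuming the 11-class RdBu scheme from RColorBrewer: brewer.pal() lists it from red to blue, so the indices are reversed to put blue at the low end. These should be the codes used in the heatmap example below, but compare them with the website if in doubt.

```r
library(RColorBrewer)

# 11-class diverging RdBu scheme: red end first, blue end last
rdbu <- brewer.pal(11, "RdBu")

# Outermost colors plus the central (neutral) color
full_range <- rdbu[c(11, 6, 1)]  # low (blue), mid, high (red)

# One step inwards on BOTH ends keeps the scale symmetric
softer <- rdbu[c(10, 6, 2)]

full_range
softer
```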
Example: correlation matrix heatmap
A classical example of a variable with two opposite ends and a relevant midpoint is the correlation coefficient. We will therefore repeat the correlation matrix heatmap example from notebook 02, but this time we will explain the possible arguments to scale_fill_gradient2() in detail.
The code to obtain aux_corr is not explained here with the same level of detail as in notebook 2. Refer to that notebook for a very detailed explanation.
1. Obtain the aux_corr dataframe. See notebook 02, heatmap example 2, for further details:
aux_corr <- diamonds %>%
  # Select only variables of type "numeric", for which Pearson's corr. coeff.
  # makes sense
  select(where(is.numeric)) %>%
  # Compute the correlation matrix between the numerical variables
  cor() %>%
  # Turn the matrix into a long-format dataframe
  reshape2::melt() %>%
  # Round the values of the correlation coefficients
  mutate(value = round(value, 2))

aux_corr
Var1 Var2 value
1 carat carat 1.00
2 depth carat 0.03
3 table carat 0.18
4 price carat 0.92
5 x carat 0.98
6 y carat 0.95
7 z carat 0.95
8 carat depth 0.03
9 depth depth 1.00
10 table depth -0.30
11 price depth -0.01
12 x depth -0.03
13 y depth -0.03
14 z depth 0.09
15 carat table 0.18
16 depth table -0.30
17 table table 1.00
18 price table 0.13
19 x table 0.20
20 y table 0.18
21 z table 0.15
22 carat price 0.92
23 depth price -0.01
24 table price 0.13
25 price price 1.00
26 x price 0.88
27 y price 0.87
28 z price 0.86
29 carat x 0.98
30 depth x -0.03
31 table x 0.20
32 price x 0.88
33 x x 1.00
34 y x 0.97
35 z x 0.97
36 carat y 0.95
37 depth y -0.03
38 table y 0.18
39 price y 0.87
40 x y 0.97
41 y y 1.00
42 z y 0.95
43 carat z 0.95
44 depth z 0.09
45 table z 0.15
46 price z 0.86
47 x z 0.97
48 y z 0.95
49 z z 1.00
2. Generate the heatmap: here we will use scale_fill_gradient2() with specific arguments.
2.1 Store the graph with all the options except for the color in variable corr_heatmap.
We carry out this step to be able to try different color scales without the need to re-run all this code.
corr_heatmap <- aux_corr %>%
  # Create the canvas for the graph
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  # Create the tiles and adjust the area
  # to ensure adequate spacing between them
  geom_tile(aes(width = 0.965, height = 0.95)) +
  # Reverse the y-axis so that the 1s of the matrix are on the main
  # diagonal of the matrix
  scale_y_discrete(limits = rev) +
  # Print the corr values within the tiles
  geom_text(aes(label = value)) +
  # Use white background as spacing within the tiles
  theme(panel.background = element_rect(fill = 'white')) +
  # Remove the x-axis and y-axis labels
  ylab("") +
  xlab("") +
  # Print the title
  ggtitle("correlation matrix - diamonds dataset")

corr_heatmap
The default continuous fill scale looks inadequate here: it does not have two clearly distinct ends to signal the range from -1 to 1 that the correlation coefficient may take.
2.2 Use scale_fill_gradient2() to introduce an appropriate gradient scale.
We will use the gradient scale selected in the screenshot of the website colorbrewer2 included before in this same notebook.
The comments in the code below explain each argument:
corr_heatmap +
  # Define a gradient color scale with two ends
  scale_fill_gradient2(
    # Specify low, mid and high colors
    low = "#053061", mid = "#f7f7f7", high = "#67001f",
    # Numeric value of the midpoint in our scale
    midpoint = 0,
    # Numerical limits to map the color scale to.
    # We pick -1 and 1, the possible range of values
    # for a corr. coefficient.
    limits = c(-1, 1),
    # Title printed on top of the legend
    name = "Corr. Coeff"
  )
To my eyes the ends of this color scale look too dark. Therefore I will move one step inwards on both the low and high ends, as indicated in the screenshots of the website colorbrewer2 included previously in this notebook:
corr_heatmap +
  # Define a gradient color scale with two ends
  # More on this command in the notebook on graph customization
  scale_fill_gradient2(
    # Specify low, mid and high colors
    low = "#2166ac", mid = "#f7f7f7", high = "#b2182b",
    # Numeric value of the midpoint in our scale
    midpoint = 0,
    # Numerical limits to map the color scale to.
    # We pick -1 and 1, the possible range of values
    # for a corr. coefficient.
    limits = c(-1, 1),
    # Title printed on top of the legend
    name = "Corr. Coeff"
  )
This looks much nicer, at least in my estimation. Feel free to try your own scales!
More about colors and ggplot2
If you wish to explore the topic of colors in ggplot further, these are good places to start. What we have seen in this notebook is more than enough for this course and to produce professional-looking graphs.
We are not going to use it in this course, but it is good that you know of the existence of the package paletteer, since it offers a large number of additional palettes.
In ggplot you can choose among several complete themes to change the general appearance of the output. The default is theme_gray().
NOTE: within each theme, you may use the function theme() to make adjustments to the template as we have been doing (check the examples given in this and in other notebooks)
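A minimal sketch of combining both: start from a complete theme and then tweak single elements with theme(). The order matters, because adding a complete theme replaces all previous theme settings, so theme() has to come after it. The two tweaks chosen here are just examples:

```r
library(ggplot2)

p1 <- ggplot(diamonds, aes(x = carat, y = price, color = price)) +
  geom_point()

# Complete theme first, then the individual adjustments
p_theme <- p1 +
  theme_minimal() +
  theme(
    legend.position = "bottom",         # move the legend below the plot
    panel.grid.minor = element_blank()  # drop the minor grid lines
  )
p_theme
```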
p1 <- ggplot(diamonds, aes(x = carat, y = price, color = price)) +
  geom_point()
p1
Examples
# The default, changes nothing
p1 + theme_gray()
# Removes the grey background and uses only black-and-white elements for background and axes
p1 + theme_bw()
p1 + theme_classic()
Complete list of themes
The complete list of themes can be found here (or simply by using autocomplete). The website includes an example for each. Below I simply list them:
theme_grey()
theme_bw()
theme_linedraw()
theme_light()
theme_dark()
theme_minimal()
theme_classic()
theme_void()
factor(): order of categorical variables in graphs.
Let us look again at the first example of a barplot we saw in notebook 1:
ggplot(diamonds, aes(x = color, fill = color)) +
  geom_bar()
We saw in the introductory session to the subject that, for this kind of graph, it is good practice to order the bars in descending or ascending order.
At this point, color is already an ordered factor in diamonds (from D, best, to J, worst), but that order does not match what we need in our graph.
To introduce the order we need, we turn this variable into a factor variable for which we specify the order. This can be attained with the function factor(), used within mutate(). We are going to do this in 2 steps.
Step 1. Get the desired ordering for the categories
The first thing is to order the colors in the desired order. In this case from largest to smallest count. That is:
aux <- diamonds %>%
  # Define color as the grouping variable
  group_by(color) %>%
  # Count the number of elements in each color group
  summarize(count = n()) %>%
  # Arrange in descending order (from largest to smallest)
  arrange(desc(count))

aux
# A tibble: 7 × 2
color count
<ord> <int>
1 G 11292
2 E 9797
3 F 9542
4 H 8304
5 D 6775
6 I 5422
7 J 2808
The column color of this aux dataframe now contains the colors in the appropriate order. We want to extract this color column and store it as a vector. See the notebook on the fundamental dplyr and tibble operations for how to do this:
# Extract the column color and store it as a vector
colors_order <- aux %>%
  pull(color)
Step 2. Redefine the variable using factor() and mutate()
diamonds <- diamonds %>%
  mutate(color = factor(color,                  # the original variable
                        levels = colors_order)  # here we specify the order
  )
If we now use the exact same code, the colors will be printed in the appropriate order:
ggplot(diamonds, aes(x = color, fill = color)) +
  geom_bar()
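The same ordering can also be obtained with base R's table() and sort(). This is an alternative to Steps 1 and 2 above, not the method used in this notebook; a minimal sketch working on a copy of the data:

```r
library(ggplot2)

diamonds_sorted <- diamonds

# table() counts rows per color; sort(decreasing = TRUE) puts the
# most frequent color first; names() keeps only the color labels
colors_order <- names(sort(table(diamonds_sorted$color), decreasing = TRUE))

# Reorder the factor levels by frequency
diamonds_sorted$color <- factor(diamonds_sorted$color, levels = colors_order)
levels(diamonds_sorted$color)

ggplot(diamonds_sorted, aes(x = color, fill = color)) +
  geom_bar()
```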
Graph labels
Title
Option 1: ggtitle()
You may use ggtitle() to define a title for your graph
p1 + ggtitle("price vs carat")
Option 2: labs()
Using this you may also specify a subtitle
p1 +
  labs(
    title = "price vs carat",
    subtitle = "(Dataset > 50000 diamonds)"
  )
Axes labels
Option 1: xlab() and ylab()
You may use xlab() and ylab() to define the labels of the x and y axis
p1 +
  ggtitle("price vs carat") +
  xlab("x = carat") +
  ylab("y = price")
You can pass the following arguments to the theme() function to rotate the axis tick labels and change their size and color:
p1 +
  theme(
    # Change size, color and rotation of the x-axis tick labels
    axis.text.x = element_text(size = 20, color = 'green', angle = 90),
    # y-axis tick labels
    axis.text.y = element_text(size = 15, color = 'red')
  )
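When tick labels are rotated by 90 degrees, the text is often not aligned with its tick. A common adjustment, sketched below, is to also set hjust and vjust in element_text(); the right values depend on the angle, so treat these as a starting point:

```r
library(ggplot2)

p_bar <- ggplot(diamonds, aes(x = clarity)) +
  geom_bar() +
  theme(
    # angle rotates the text; hjust = 1 right-aligns it so the end of each
    # label sits at its tick, vjust = 0.5 centers it on the tick
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  )
p_bar
```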
labs() for other graph labels
You may use the function labs() to include many more annotations on your figure.
The labels in the example below have been included so that you can see the possibilities. This does not mean it is best practice to always include all of them.
p1 +
  labs(
    title = "price vs carat",
    subtitle = "(Dataset > 50000 diamonds)",
    x = "x = carat",
    y = "y = price",
    caption = "Data from ggplot's Diamonds dataset", # Includes caption
    tag = "Figure 1",                                # Includes tag on the figure
    colour = "price of\nthe diamond"                 # Adapts the title of the legend
                                                     # \n is the newline character
  )
Saving ggplot figures
In this section we briefly explain how to save a ggplot image.
Imagine you have created a plot and stored it as p1 or some other variable name. For example:
p1 <- ggplot(diamonds, aes(x = carat, y = price, color = price)) +
  geom_point()
p1
Option 1 - have a window pop-up asking for file location
We have stored the previous graph in the variable p1. We may store this graph as an image file on our computer with the code below.
# Save the ggplot object to the chosen location
ggsave(filename = file.choose(new = TRUE), plot = p1)
This will open a window so that you can select the folder where the file is to be stored. Once you are at the desired location, type a filename and an appropriate image extension (e.g. .png). In my case I am going to specify the filename fig1.png
Then click on Open and the figure will be saved.
You may get a prompt saying that the file does not exist and asking you if you want to create it. In that case, answer yes.
Option 2 - setting up the working directory and filename
Step 1: Setting the working directory
The file will be saved in your current working directory. Hence the importance of setting the current working directory.
This link is a video I created for you on how to set up the working directory. It is part of a playlist on the fundamentals of RStudio.
Once the working directory has been set, you can check it has been properly changed with the following command
# RETURNS THE WORKING DIRECTORY
getwd()
Step 2: Save the file
Once you have set up the working directory, you can save the file with the following command
filename = "image_name.png"
# Save the ggplot object to the chosen location
ggsave(filename = filename, plot = p1)
You can change the extension to other image formats (.jpg, .svg) to get the output image in other formats (.svg output requires the svglite package).
The resulting file will be saved in the working directory you set up.
Changing the size of the saved image
By default, ggsave() uses the size of the current graphics device. You can specify a different (for example, larger) height and width if your visualization requires it.
To attain this, specify the arguments height, width and units when saving the figure. For example, the code below sets height = 7 inches and width = 7 inches:
# Save the ggplot object to the chosen location
ggsave(filename = file.choose(new = TRUE), plot = p1,
       width = 7, height = 7, units = "in")
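The units argument also accepts centimeters, and dpi controls the resolution of raster formats such as .png. A minimal sketch that writes to a temporary file (via tempfile()) so it does not depend on your working directory; the 20 x 12 cm size and 300 dpi are example values:

```r
library(ggplot2)

p1 <- ggplot(diamonds, aes(x = carat, y = price, color = price)) +
  geom_point()

# Temporary file so the example does not depend on the working
# directory; replace out_file with your own filename
out_file <- tempfile(fileext = ".png")

# 20 cm x 12 cm at 300 dots per inch
ggsave(filename = out_file, plot = p1, width = 20, height = 12,
       units = "cm", dpi = 300)

file.exists(out_file)
```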
Important final note
ggplot offers many more options for graph customization. There is no way we are going to be able to cover all of it in class.
This is just an introduction to using these options.
With this strong foundation you may now use your best friends Google or ChatGPT and adapt the code they return to you.
Alternatively, you may also explore this excellent book.